Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency
Given a social network G and a constant k, the influence maximization problem
asks for k nodes in G that (directly and indirectly) influence the largest
number of nodes under a pre-defined diffusion model. This problem finds
important applications in viral marketing, and has been extensively studied in
the literature. Existing algorithms for influence maximization, however, either
trade approximation guarantees for practical efficiency, or vice versa. In
particular, among the algorithms that achieve constant factor approximations
under the prominent independent cascade (IC) model or linear threshold (LT)
model, none can handle a million-node graph without incurring prohibitive
overheads.
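To make the IC model concrete, here is a minimal Monte Carlo sketch. It assumes the graph is given as a dict mapping each node to a list of (out-neighbor, propagation probability) pairs. The names `simulate_ic` and `estimate_spread` are illustrative only and do not come from the paper.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent cascade (IC) diffusion; return the final number of active nodes.

    graph maps each node to a list of (out_neighbor, probability) pairs. Every newly
    activated node gets exactly one chance to activate each still-inactive out-neighbor.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicated seeds, order preserved
    active = set(frontier)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of the expected number of nodes activated by `seeds`."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, rng) for _ in range(runs))
    return total / runs
```

Consider `{"a": [("b", 1.0), ("c", 0.5)], "b": [("d", 1.0)]}` with seed set `["a"]`. Nodes a, b, and d are always activated, and c is activated with probability 0.5. The estimate should therefore be close to 3.5.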
This paper presents TIM, an algorithm that aims to bridge theory and
practice in influence maximization. On the theory side, we show that TIM runs
in $O((k+\ell)(n+m)\log n/\epsilon^2)$ expected time and returns a
$(1-1/e-\epsilon)$-approximate solution with probability at least $1 - n^{-\ell}$.
The time complexity of TIM is near-optimal under the IC model, as it is only a
$\log n$ factor larger than the $\Omega(m + n)$ lower bound established in previous
work (for fixed $k$, $\ell$, and $\epsilon$). Moreover, TIM supports the triggering
model, which is a general diffusion model that includes both IC and LT as
special cases. On the practice side, TIM incorporates novel heuristics that
significantly improve its empirical efficiency without compromising its
asymptotic performance. We experimentally evaluate TIM with the largest
datasets ever tested in the literature, and show that it outperforms the
state-of-the-art solutions (with approximation guarantees) by up to four orders
of magnitude in terms of running time. In particular, when $k = 50$, $\epsilon = 0.2$,
and $\ell = 1$, TIM requires less than one hour on a commodity machine to
process a network with 41.6 million nodes and 1.4 billion edges.
Comment: Revised Sections 1, 2.3, and 5 to remove incorrect claims about
reference [3]. Updated experiments accordingly. A shorter version of the
paper will appear in SIGMOD 201
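TIM's analysis builds on sampling reverse reachable (RR) sets. Under the IC model, a seed set's expected spread equals n times the probability that it intersects a random RR set. Choosing k nodes that cover many sampled RR sets therefore approximates influence maximization.

The sketch below shows only this core idea for the IC model. It differs from TIM in three ways:

- It fixes the number of RR sets `theta` by hand, whereas TIM derives it from ε, ℓ, and k.
- It omits TIM's practical heuristics.
- It omits TIM's support for LT and the triggering model.

The graph format and all function names are assumptions.

```python
import random
from collections import defaultdict


def reverse_graph(graph):
    """Turn {u: [(v, p), ...]} out-edges into {v: [(u, p), ...]} in-edges."""
    rev = defaultdict(list)
    for u, edges in graph.items():
        for v, p in edges:
            rev[v].append((u, p))
    return rev


def random_rr_set(rev, nodes, rng):
    """Sample one reverse reachable (RR) set under the IC model.

    A target node is chosen uniformly at random; the set then collects every node
    that reaches it when each edge is kept independently with its probability.
    Edges are sampled lazily while searching backwards from the target.
    """
    target = rng.choice(nodes)
    rr = {target}
    stack = [target]
    while stack:
        v = stack.pop()
        for u, p in rev.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def greedy_seeds(graph, k, theta=10_000, seed=0):
    """Pick k seeds covering the most of `theta` RR sets (greedy max coverage).

    Returns (seeds, estimated spread), where the estimate is
    n * (fraction of RR sets intersected by the seeds).
    """
    rng = random.Random(seed)
    rev = reverse_graph(graph)
    # All nodes, including those that only appear as edge heads, in a stable order.
    nodes = list(dict.fromkeys([*graph, *(v for edges in graph.values() for v, _ in edges)]))
    rr_sets = [random_rr_set(rev, nodes, rng) for _ in range(theta)]
    covers = defaultdict(set)  # node -> ids of the RR sets that contain it
    for i, rr in enumerate(rr_sets):
        for u in rr:
            covers[u].add(i)
    chosen, covered = [], set()
    for _ in range(min(k, len(nodes))):
        best = max((u for u in nodes if u not in chosen),
                   key=lambda u: len(covers[u] - covered))
        chosen.append(best)
        covered |= covers[best]
    return chosen, len(nodes) * len(covered) / theta
```

On the chain `a -> b -> c` with all edge probabilities 1.0, every RR set contains `a`. `greedy_seeds(chain, 1)` therefore picks `a` with an estimated spread of 3.0.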
An efficient algorithm for mapping vehicle trajectories onto road networks
Modern mobile technology has enabled the collection of large-scale vehicle trajectories using GPS devices. As GPS measurements may contain errors, vehicle trajectories are often noisy. A common practice to alleviate this issue is to apply map-matching, i.e., to align vehicle trajectories with the road segments in a digitized road network. This paper presents an efficient solution for the map-matching problem that won the SIGSPATIAL CUP 2012. Given a road network, our solution first constructs a grid index on the road segments. For each point p on a vehicle trajectory, we employ the index to identify a candidate set of road segments that are close to p, and then we refine the candidate set to select the segment that matches p with the highest probability. The selection of the best match is based on a metric that takes into account (i) the correlation between consecutive GPS measurements and (ii) the directions and shapes of the road segments. Experimental results on real vehicle trajectories and road networks demonstrate the effectiveness and efficiency of the proposed solution.
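The grid-index and candidate-selection steps can be sketched as follows, assuming planar (projected) coordinates and a fixed search radius. The winning solution's metric also weighs the correlation between consecutive GPS points and the direction and shape of each segment. This sketch replaces that metric with plain point-to-segment distance, so it is a simplified stand-in rather than the authors' method. `GridIndex`, `match_point`, `cell_size`, and `radius` are hypothetical names.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp the projection to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


class GridIndex:
    """Uniform grid; each cell lists the segments whose bounding box overlaps it."""

    def __init__(self, segments, cell_size):
        self.segments = segments  # segment id -> ((x1, y1), (x2, y2))
        self.cell = cell_size
        self.cells = defaultdict(list)
        for sid, (a, b) in segments.items():
            x0, x1 = sorted((a[0], b[0]))
            y0, y1 = sorted((a[1], b[1]))
            for cx in range(self._key(x0), self._key(x1) + 1):
                for cy in range(self._key(y0), self._key(y1) + 1):
                    self.cells[(cx, cy)].append(sid)

    def _key(self, coord):
        return math.floor(coord / self.cell)

    def candidates(self, p, radius):
        """Segments registered in any cell overlapping the square of half-width radius around p."""
        found = set()
        for cx in range(self._key(p[0] - radius), self._key(p[0] + radius) + 1):
            for cy in range(self._key(p[1] - radius), self._key(p[1] + radius) + 1):
                found.update(self.cells.get((cx, cy), []))
        return found


def match_point(index, p, radius):
    """Closest candidate segment within radius of p, or None (ties go to the smallest id)."""
    best, best_d = None, math.inf
    for sid in sorted(index.candidates(p, radius)):
        d = point_segment_distance(p, *index.segments[sid])
        if d <= radius and d < best_d:
            best, best_d = sid, d
    return best
```

Each segment is registered in every cell its bounding box overlaps. Any segment within `radius` of p therefore appears among the candidates for p, and the exact distance check then discards false positives.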
HubPPR: Effective Indexing for Approximate Personalized PageRank
Personalized PageRank (PPR) computation is a fundamental operation in web search, social networks, and graph analysis. Given a graph G, a source s, and a target t, the PPR query π(s, t) returns the probability that a random walk on G starting from s terminates at t. Unlike global PageRank, which can be effectively pre-computed and materialized, the PPR result depends on both the source and the target, rendering result materialization infeasible for large graphs. Existing indexing techniques have rather limited effectiveness; in fact, the current state-of-the-art solution, BiPPR, answers individual PPR queries without pre-computation or indexing, and yet it outperforms all previous index-based solutions.
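The definition of π(s, t) suggests a direct Monte Carlo estimator: run many random walks from s and count where they stop. The sketch rests on two assumptions that the abstract does not fix:

- Each walk stops with probability α = 0.2 at every step.
- A walk at a node with no out-neighbors stops there.

This is only the plain sampling baseline. BiPPR combines such forward random walks with a backward local push from t, which is not shown.

```python
import random
from collections import Counter


def walk_endpoint(graph, start, alpha, rng):
    """Endpoint of a random walk that stops with probability alpha at each step.

    graph maps each node to a list of out-neighbors. A walk at a node with no
    out-neighbors is assumed to stop there (dead-end handling varies in practice).
    """
    node = start
    while rng.random() >= alpha:
        neighbors = graph.get(node)
        if not neighbors:
            break
        node = rng.choice(neighbors)
    return node


def estimate_ppr(graph, s, alpha=0.2, walks=100_000, seed=0):
    """Monte Carlo estimate of pi(s, t) for every t: the fraction of walks ending at t."""
    rng = random.Random(seed)
    ends = Counter(walk_endpoint(graph, s, alpha, rng) for _ in range(walks))
    return {t: count / walks for t, count in ends.items()}
```

On the two-node cycle `{"a": ["b"], "b": ["a"]}`, a walk from `a` ends at `a` exactly when it takes an even number of steps. This gives π(a, a) = α / (1 − (1 − α)²) = 5/9 for α = 0.2, and the estimate should land close to that value.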
Motivated by this, we propose HubPPR, an effective indexing scheme for PPR computation with controllable tradeoffs among accuracy, query time, and memory consumption. The main idea is to pre-compute and index auxiliary information for selected hub nodes that are often involved in PPR processing. Going one step further, we extend HubPPR to answer top-k PPR queries, which return, among a given set T of target nodes, the k nodes with the highest PPR values with respect to a source s. Extensive experiments demonstrate that, compared to the current best solution, BiPPR, HubPPR achieves up to 10x and 220x speedups for PPR and top-k PPR processing, respectively, with moderate memory consumption. Notably, with a single commodity server, HubPPR answers a top-k PPR query in seconds on graphs with billions of edges, with high accuracy and strong result quality guarantees.
MOE (Min. of Education, S’pore). Published version
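The abstract does not spell out what HubPPR stores for each hub or how hubs are selected. The sketch below is therefore a hypothetical illustration of how pre-computed information at hub nodes can speed up walk-based PPR estimation, combined with a top-k query over a target set T. It stores walk endpoints for the highest out-degree nodes and splices them into query walks. The following are all assumptions:

- the hub-selection rule (highest out-degree);
- α = 0.2;
- the rule that dead ends stop the walk;
- the function names.

```python
import heapq
import random
from collections import Counter

ALPHA = 0.2  # assumed per-step termination probability


def walk_from(graph, node, rng, hub_pools=None, cursors=None):
    """Endpoint of one alpha-terminating random walk starting at `node`.

    If the walk reaches a hub that still has unused stored endpoints, it takes
    the next one instead of walking on. This is valid because the rest of a walk
    that has just arrived at node h is distributed exactly like a fresh walk
    started at h (termination is memoryless).
    """
    while True:
        if hub_pools is not None and node in hub_pools:
            used = cursors[node]
            if used < len(hub_pools[node]):
                cursors[node] = used + 1  # each stored endpoint is used once per query
                return hub_pools[node][used]
        if rng.random() < ALPHA:
            return node
        neighbors = graph.get(node)
        if not neighbors:  # dead end: the walk is assumed to stop here
            return node
        node = rng.choice(neighbors)


def build_hub_index(graph, num_hubs, walks_per_hub, seed=0):
    """Hypothetical hub index: take the highest out-degree nodes as hubs and
    store the endpoints of independent walks started from each of them."""
    rng = random.Random(seed)
    hubs = sorted(graph, key=lambda u: len(graph[u]), reverse=True)[:num_hubs]
    return {h: [walk_from(graph, h, rng) for _ in range(walks_per_hub)] for h in hubs}


def topk_ppr(graph, s, targets, k, hub_index, walks=50_000, seed=1):
    """Estimate pi(s, t) for each t in `targets` and return the k largest as (t, score)."""
    rng = random.Random(seed)
    cursors = Counter()  # per-query position in each hub's endpoint pool
    ends = Counter(walk_from(graph, s, rng, hub_index, cursors) for _ in range(walks))
    scores = {t: ends[t] / walks for t in targets}
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Consuming each stored endpoint at most once per query keeps the walks within that query independent. Once a hub's pool is exhausted, walks through it are simply simulated in full.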
Distribution of Triarrhena lutarioriparia and its reserve characteristics of nitrogen and phosphorus in Dongting Lake
Triarrhena lutarioriparia, a typical and the most abundant macrophyte in the Dongting Lake wetland, was abandoned following the closure of the papermaking industry in the lake basin. To provide a scientific basis for the precise management of T. lutarioriparia, this study investigated its distribution in Dongting Lake and its nutrient storage characteristics. Remote sensing interpretation showed that the total area of T. lutarioriparia in the Dongting Lake wetland was 58,450 ha, 48.31% of which was distributed in the South Dongting Lake wetlands. Nutrient contents differed significantly among T. lutarioriparia tissues, ranking in descending order as spikes (TN 27.90 mg/g, TP 3.46 mg/g) > leaves (TN 16.38 mg/g, TP 2.11 mg/g) > stems (TN 5.38 mg/g, TP 0.85 mg/g). The total P quantities in the tissues ranked as stems (560.26 t) > leaves (396.52 t) > spikes (284.67 t), while the total N quantity in each tissue ranged from 2170.02 to 2801.3 t. It was estimated that about 7712.99 t of TN and 1241.45 t of TP were removed annually from Dongting Lake by reaping T. lutarioriparia. The nutrients stored in the dead tissues of T. lutarioriparia might have a non-negligible impact on the water quality of Dongting Lake.
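Two of the figures above can be cross-checked with simple arithmetic. The tissue-level TP quantities sum exactly to the reported annual TP removal. The 48.31% South Dongting Lake share of the total area corresponds to roughly 28,237 ha.

```latex
\begin{aligned}
\text{TP}_{\text{stems}} + \text{TP}_{\text{leaves}} + \text{TP}_{\text{spikes}}
  &= 560.26 + 396.52 + 284.67 = 1241.45\ \text{t} \\
\text{Area}_{\text{South Dongting}}
  &\approx 0.4831 \times 58{,}450\ \text{ha} \approx 28{,}237\ \text{ha}
\end{aligned}
```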